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(57) Abstract: A flow-based intrusion detection system for detecting intrusions in computer communication networks. Data packets 
representing communications between hosts in a computer-to-computer communication network are processed and assigned to var- 
ious client/server flows. Statistics are collected for each flow. Then, the flow statistics are analyzed to determine if the flow appears 
to be legitimate traffic or possible suspicious activity. A concern index value is assigned to each flow that appears suspicious. By 
assigning a value to each flow that appears suspicious and adding that value to the total concern index of the responsible host, it is 
possible to identify hosts that are engaged in intrusion activity. When the concern index value of a host exceeds a preset alann value, 
an alert is issued and appropriate action can be taken. 
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10 FLOW-BASED DETECTION OF NETWORK INTRUSIONS 

Cross Reference to Related Applications 
This Patent Application claims priority to the United States provisional patent 
application serial no. 60/250,261 oititled "System and Method for Monitoring Network 
15 Traffic" filed November 30, 2000 and United States provisional patent plication serial 
no, 60y265,194 entitled "The Use of Flows to Analyze Network Traffic" filed on January 
3, 2001, both of \;sdiich are incorporated in their CTtirefy by reference and made a part 
hereof. t 

20 Reference to Computer Program Listing Submitted on CD 

This application incorporates by reference the computer program listing appendix 
submitted on (1) CD-ROM entitled 'Tlow-Based Engine Computer Program Listing" in 
accordance with 37 CF^R. § L52(e). Pursuant to 37 CJF.R. § 1.77(b)(4), the material on 
said CD-ROM is incorporated by reference herein, said material being identified as 

25 follows: 

Sizein ^ Date of File Name 

Bytes Creation 



30 154,450 November 30, 2001 



LANcope Code.txt 
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trafBc designed b}pass &e bloddng rules established Additionally, almost all 
commercially available IDS are signature based detection systems or anomaly based 
systems. 

Signature based detection systems piece together the packets in a connection to 
5 collect a stream ofbytes being transmitted The stream is then analyzed for certain strings 
of characters in the data commonly referred to as **signatures." These signatures are 
particular strings that have been discovered in known exploits. The more signatures that 
are stored jn a database, the longer it takes to do on exhaustive search on each data stream. 
For larger networks with massive amounts of data transferred, a string comparison 

10 approach is unfeasible. Substantial computing resources are needed to analyze all of the 
communication trafQc. 

Besides, even if a known exploit signature has been discovered, the signature is not 
useful until it is has been installed and is available to the network. In addition, signature 
analysis only protects a system from known attacks. Yet, new attacks are being 

15 implemented all the time. Unfortunately, a signature based detection system would not 
detect these new attacks and leave die network vulnerable. 

Another approach to intrusion detection includes detection of unusual deviation 
from normal data traffic commonly referred to as ^anomalies.'' Like signature-based 
detection systems, roany current anontialy based intrusion detection systems only detect 

20 known methods of attacks. Some of tiiese known anomaly based attacks include TCP/BP 
stack fingeiprinting, half-oi>en attacks, and port scanning. However, systems relying on 
known attacks are easy to circxmmavigate and leave the system vulnerable. In addition, 
some abnormal network traffic happens routinely, often non-ntialiciously, in normal 
network traffic. For example, an incorrectly entered address could be sent to an 

25 unauthorized port and be interpreted as an abnormality. Consequently, known anomaly 
based systems tend to generate an undesirable nimiber of false alarms which creates a 
tendency to have all alarms generated to become ignored. 

Some known intrusion detection systems have tried to detect statistical anomalies. 
The approach is to measure a baseline and then trigger an alarm when deviation is 

30 detected. For example, if a system typically has no traffic from individual workstations at 
2 am, activity during this time frame would be considered suspicious. However, baseline 
systems have typically been ineffective because the small amount of malicious activity is 
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be legitimate traffic or possible suspicious activity. A value, referred to as a "concern 
index," is assigned to each flow that appears suspicious. By assigning a value to each 
flow that appears suspicious and adding that value to an accumulated concern index 
associated with the responsible host, it is possible to identify hosts that are engaged in 
5 intruder activity without generation of significant unwarranted false alarms. When the 
concern index value of a host exceeds a preset alarm value, an alert is issued and 
appropriate action can be taken. 

Generally speaking, the intrusion detection system analyzes network 
commimication traffic for potential detrimental activity. The system collects flow data 

10 from packet headers between two hosts or Internet Protocol (IP) addresses. Collecting 
flow data from packet headers associated with a single service where at least one port 
remains constant allows for more efficient analysis of fhe flow data. The collected flow 
data is analyzed to assign a concern index value to the flow based upon a probability that 
the flow was not normal for data communications. A host list is maintained containing an 

15 accumulated concern index derived fix>m fhe flows associated with the host. Once the 
accumulated concern index has exceeded an alarm threshold value, an alarm signal is 
generated. 

Brief DESCRipnoN of the Drawings 
20 Benefits and fiirther features of the present invention will be apparent Scorn a 

detailed description of preferred embodiment th^of taken in conjunction with the 
following drawings, wherein like elements are referred to with like reference numbers, and 

FIG. 1 is a fimctional block diagram illustrating a flow-based intrusion detection 
25 system constructed in accordance with a preferred embodiment of the present invention. 
FIO. 2 is a diagram illustmting headers of datagrams. 
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Overview 

Turning to the figures, in which like numerals indicate like elements throughout 
the several figures, FIG. 1 provides an overview of a flow-based intrusion detection 
system or engine 155 in accordance with an exemplary embodiment of the present 

5 invention. The flow-based intrusion detection system 155 monitors network computer 
communications. The network computer conunimications are routed via a known global 
computer network commonly known as the Internet 199. In accordance with an aspect of 
the invention, the intrusion detection engine 155 is incorporated into a monitoring 
appliance 150, together with a database 160 that stores information utilized in the intrusion 

10 detection methodology. 

The operating environment of the intrusion detection system 155 is contemplated 
to have numerous hosts connected by the Intemet 199, 6.g. Host #1, Host #2, Host #3 (also 
referred to as H1-H3 respectively). Hosts are any computers that have fiill two-w^ access 
to other compiiters on the Intemet 199 and have their own unique IP address. For example 

15 Host #1 has an exemplary IP address of 208.60.239.19. The Intemet 199 connects clients 
1 10 with a host server 130 in known client/server relationship. 

In a typical configuration, some computers are referred to as "servers', while 
others are referred to as "clients." A server computer such as Host #2 130 typically 
provides responses to requests &om client computers and provides services, data, 

20 resources, and the like. While a client computer such as Host #1 110 typically requests 
and utilizes the services, data, resources, and the like provided by the server. 

It is known in the art to send communications between hosts via the Intemet 199. 
The Intemet Protocol (BP) is the method by which data is sent firom one host computer to 
another on the Intemet 199. Each host on the Intemet 199 has an IP address that uniquely 

25 identifies it firom all other computers. When data, is transmitte4 the message gets divided 
into packets 101. Packets 101 are discussed in more detail in reference to FIG. 2. 

Each IP packet 101 includes a header that contains both the sender's Intemet 
address and receiver's Intemet address. The packets 101 are forwarded to the compxiter 
whose address is specified. Illustrated is a legitimate user/client 1 10, host #1 (HI), with 

30 an IP address of 208.60.239.19 and a server, host #2 (H2), with an BP address of 128.0.0.1. 

As shown, a client 110 communications with a server 130 by sending packets 101 
of data. A packet 101 is a unit of data that is routed between an origin and destination. As 
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host server port 80. File Transfer Protocol (FTP) control commxinications are sent to the 
server port 21, w^hile FTP data transfer originates from port 20. The FINGER service 
utilizes service contact port 79, the domain name service (DNS) utilizes service contact 
port 53, and Telnet communications utilize service contact port 23. As illustrated, 
5 common services are typically associated with specific predetemiined service contact 
ports. 

Also illustrated in FIG. 1 are four flows, Fl through F4, between by client host #1 
110 and service host #2 130, Flow Fl is a file transfer utilizing the File Transfer Protocol 
(FTP). As shown, the file transfer (flow Fl) is delivered by a stream of packets 101 (PI — 
10 P3) that will be reassembled by the receiving host 110. 

After the file transfer is completed, the cli^t i 10 initiates an HTTP Web session 
(flow F2) with server 120. Those skilled in the art understand fiiat a Web session typically 
occurs when an Internet browser computer program such as MICROSOFT INTEKNET 
EXPLORER or NETSCAPE NAVIGATOR requests a web page from a World Wide Web 
15 (WWW) service on port 80. Packets P4, P5, P6, and P9 are associated with the Web 
trafflc of flow F2. These packets may contain data such as a JPG format picture to be 
displayed, text, a JAVA program, or other informational materials to be displayed or 
handled by tiie client's Ihtemet browser program. , 

Continuing the example of FIG. 1, vfbUe the web session of flow P2 is still open, 
20 the client 110 s^t an email illustrated by flow F3. As shown, the email packets of flow 
F3 may be int^leaved with the previously opened Web session of flow F2. As illustrated, 
packets P7, P8, and P12 contam the e-mail message. 

Finally, the client 1 10 requests another web page firom the server 120, initiating yet 
anotho: HTTP flow F4. Packets P9, PIO, Pll, P12, and P14 represent the new Web 
25 traffic. 

In accordance with an aspect of the invention, a flow is considered terminated after 
a predetermined period of time has elapsed on a particular connection or port For 
example, if HTTP Web traffic on port 80 ceases for a predetermined period of time, but 
other traffic begins to occur on port 80 after the expiration of that predetermined time 
30 period, it is considered that a new flow has begun, and the system responds accordingly to 
assign a new flow number and track the statistics and diaracteristics thereof. In the 
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Consequently, abnonnal flows and/or events identified by the intrusion detection 
engine 155 will raise the concern index (CI) for the associated host. The intrusion 
detection engine 155 analyzes the data flow between IP devices. However, different types 
of services have different flow characteristics associated with that service. Therefore, a 
5 C/S flow can be determined by the packets exchanged between the two hosts dealing with 
the same service. 

In accordance with an aspect of the invention; the intrusion detection engine 155 
works by assigning data packets 101 to various flows. The engjne 155 collects 
information about and statistics associated with each flow and stores this information and 

10 statistics in a database 160. The flow database 160 comprises a flow data structure 162 
and a host data structure 166. The flow data structure 162 stores collected flow 
infonnation such as the IP addresses. The ei^ine determines which host has a lower IP 
address and assigns that host IPO. The other host is assigned IPl. PortO is associated with 
IPO and portl is tiie service connection port for hostl. The flow data structure 162 also 

15 stores time and other related packet infonnation derived fix>m the packet header. In the 
disclosed embodiment, this time infomoation (e.g. time of tiie first packet, time of the last 
packet) is utilized to measure the elapse of time for purposes of flow delimiting, as 
described above. 

Tlie intrusion detection engine 155 analyzes the flow data 160 to determine if the 
20 flow appears to be legitimate traJOBc or possible suspicious activity. Flows with suspicious 
activity are assigned a predetemuned concem index (CT) value based upon a heuristically 
predetermined assessment of tiie significance of the threat of the particular traffic or flow 
or suspicious activity. The flow concem index values have been derived heuristically 
fix)m extensive network traffic analysis. Concem index values are associated with 
25 particular hosts and stored in the host data structure 166 (FIG. 1). Exemplary concem 
index values for various exemplary flow-based events and other types of events are 
illustrated in connection with FIGS. 6 and 7. 

By assigning a value to each flow that appears suspicious and adding that value to 
a total CI of the host responsible for the flow, it is possible to identify hosts that are 
30 engaged in intruder activities. When the CI of a host exceeds a preset alarm threshold, a 
alarm signal may be generated. In the example of FIG. 1, host H3 has an accumulated CI 
of 3,980. This exceeds the preset threshold of 3,500 for that network and a system 
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reliance on earlier exchanges between the source and destination computer. Packets 101 
have a header and a data segment as illustrated by FIG. 2. The term "packet" in present- 
day parlance has generally replaced the term "datagram". 

Restated, a packet 101 is the imit of data that is routed between an origin and 

5 destination on a packet-switched network such as the Internet 1 99. A packet-switching 
scheme is an efBcient method of handling transmissions on a connectionless network. 
However, connection-oriented protocols can be utilized to create a session. A session is a 
series of interactions between two communication end points that occur diiring the span of 
a single connection. A detailed discussion of a TCP/IP session is described in reference to 

10 FIG. 3. However, a host can send a message without establishing a connection with the 
recipient That is, the host simply sends a packet 101 onto the network 199 with the 
destination addi^ess and hopes that it axrives. 

FIG. 2 illustrates an exemplary TCP/IP packet or datagram 210 and an exemplary 
UDP datagram 240. In a typical TCP/IP packet like 210, each packet typically mcludes a 

15 header portion comprising an IP header 220 and a TCP header 230, followed by a data 
portion that contains the information to be communicated in the packet The information 
in the IP headCT 220 contained in a TCP/IP packet 210, or any other IP packet, contain^ the 
IP addresses and assures that the packet is delivered to the right host The transport layer 
protocol (TCP) header follows the tntemet protocol header and specifies the port numbers 

20 for the associated service. 

Hie header portion in the typical TCP/BP datagram 210 is 40 bytes including 20 
bytes of EP header 220 information and 20 bytes of TCP header 230 information. The data 
portion or segment associated with the packet 210 follows flie header information. 

In regards to atypical IP packet 210, the first 4 bits of the IP header 220 identify 

25 the Internet protocol (IP) version. The following 4 bits identify flie IP header length in 32 
bit words. The next 8 bits diflferentiate the type of service by describing how the packet 
should be handled in transit The, following 16 bits convey the total packet length. 

Lai^e packets tend to be firagmented by networks that cannot handle a large 
packet size. A 16-bit packet identification is used to reassemble fragmented packets. 

30 Three one-bit set of fragmentation flags control whether a packet is or may be fragmented. 
The 13-bit fragment oflfeet is a sequence number for the 4-byte words in the packet when 
reassembled. In a series offiragments, the first offset will be zero. 
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underlying protocol. The UDP protocol is transaction oriented and delivery protection is 
not guaranteed. Applications requiring reliable delivery of data typically use the 
previously described Transmission Control Protocol (TCP). 

The 16-bit UDP source port is a field to which port a reply, when meaningful, 

5 should be addressed. The 16-bit UDP destination port specifies the server program on the 
receiving host to execute the packet Next, the 16-bit UDP message length field is the 
length in bytes of the user datagram including header and any data. Following the length 
field is the 16-bit checksum of the UDP header, the UDP pseudo header information 250 
fi-om an IP header 220, and the data. 

10 As will be understood by those skilled in fhe art, fhe fimdamental Internet service 

consists of a packet delivery system. Internet service is typically considered 
"coimectionless'^ because each packet is treated independratly of all others. Some 
transport protocols such as UDP provide unreliable service because the delivery of the 
packet is not guaranteed. Other transport protocols such as TCP provide a mechanism to 

15 ensure delivery of a packet and therefore can be used to establish computer-to-comput^ 
••sessions" in the conventional sense of the term. FIG. 3 illustrates atypical TCP/IP 
session and the guaranteed packet delivery mechanism. 

As previously stated, the flow-based engine 155 does not analyze the data 
segments of packets for signature identification. Instead, the engine 155 associates all 

20 packets with a flow. It analyzes certain statistical data and assigns a concern index value 
to abnormal activity. The engine 155 builds a concem index for suspicious hosts by 
detecting susfpicious activities on the network. An alarm is generated ^^en those hosts 
build enough concem (in the form of a cumulated CI value) to cross the network 
administrator's predetermined threshold. 

25 

Session 

Turning next to FIG. 3, a TCP session 300 is a fiill duplex connection that allows 
concurrent transfer of data in both dkections. Before the transfer can start, both the 
shading and receiving application programs interact with their respective operatiDg 
30 systems, informing them of the impending stream transfer. Protocol software 

communicates by sending messages across, verifying that the transfer is authorized, and 
indicating that both sides are ready to receive data. 
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underlying protocol. The UDP protocol is transaction oriented and delivery protection is 
not guaranteed. Applications requiring reliable delivery of data typically use the 
previously described Transmission Control Protocol (TCP). 

The 16-bit UDP source port is a field to which port a reply, when meaningful, 

5 should be addressed. The 16-bit UDP destination port specifies the server program on the 
receiving host to execute the packet Next, the 16-bit UDP message length field is the 
length in bytes of the user datagram including header and any data. Following the length 
field is the 16-bit checksum of the UDP header, the UDP pseudo header information 250 
fix>m an IP header 220, and flie data. 

10 As will be understood by those skilled in the art, the fundamental Internet service 

consists of a packet delivery system. Internet service is typically considered 
''connectionless" because each packet is treated independently of all others. Some 
transport protocols sudi as UDP provide unreliable service because tiie delivery of the 
packet is not guaranteed. Other transport protocols such as TCP provide a mechanism to 

15 ensure delivery of a packet and therefore can be used to establish computer-to-computer 
''sessions" in the conventional sense of the term. FIG. 3 illustrates a typical TCP/IP 
session and the guaranteed packet delivery mechanism. 

As previously stated, Ae flow-based engine 155 does not analyze the data 
segments of packets for signature identification. Instead, the engine 155 associates all 

20 packets with a flow. It analyzes certain statistical data and assigns a concern index value 
to abnormal activity. The engine 155 builds a concem index for suspicious hosts by 
detecting suspicious activities on the network. An alarm is generated whm those hosts 
build enough concem (in the form of a cumulated CI value) to cross the network 
administrator's predetermined tiireshold. 

25 

Session 

Tummg next to FIG. 3, a TCP session 300 is a full duplex connection that allows 
concurrent transfer of data in both directions. Before the transfer can start, both the 
sending and receiving application programs interact with their respective operating 
30 systems, informing them of the impending stream transfer. Protocol software 

communicates by sending messages across, verifying that the transfer is authorized, and 
indicating that both sides are ready to receive data. 
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Alternatively, a host may desire to keep a session active even after if has finished 
sending its current data. If more data is to be sent in flie near future, it is more efiScient to 
keep a session open than it is to open multiple sessions. A session wherem the connection 
is kept open in case future data is to be communicated is typically referred to as a 

5 **persistenf • session. In this scenario, a session is closed by siding a padcet with the reset 
flag (R) set (also called a **reset packet**) after no data is delivered after a period of time. 
Many browser applications provide a 300-second window of inactivity before closing a 
session with an R packet (reset). 

The described TCP session 300 of FIG. 3 is a generic TCP session in vdiich a 

10 netwodc might engage. In accordance with the invention, flow data is collected about the 
session to help determine if the communication is abnormal. In tiie preferred embodiment, 
information such as the total number of packets sent, the total amount of data sent, the 
session start time and duration, and the TCP flags set in all of the packets, are collected, 
stored in the database 160, and analyzed to determine if the communication was 

15 suq>icious. Ifacommumcationisdeemedsuspicious,i.e. it meets predetermined criteria, 
a predetermined concern index value associated witii a determined category of suspicious 
activity is added to the cumulated CI value associated with the host that made the 
communication. 

For example, a TCP/IP packet witii both the S YN flag and the FIN flag set would 
20 not exist in a normal communication. Because a packet with both die S YN and FIN flags 
set is undefined, each operating system handles this packet in different methods. An 
operating system may send an ICMP message, a reset, or possibly just ignore it and send 
nothing. Consequently, an intruder may send a SYN-FIN packet specifically to help 
idmtify the operating system of the targeted host 
25 As another example, if a particular host sends a large niunber of S YN packets to a 

target host and in response receives nimierous R packets firom the targeted host, a potential 
TCP probe is indicated. Likewise, numerous UDP packets sent fix>m one host to a targeted 
host and numerous ICMP '*port unavailable" packets received firom the targeted host 
indicates a potential UDP probe. A stealth probe is indicated by multiple packets firom the 
30 same source port number sent to different port numbers on a targeted host 

As has been described elsewhere, UDP packets are often used in connection with 
streaming media and other applications that provide data to many hosts. A UDP packet 
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services is analyzed separately. Therefore, as defined, the illustrated communications 
represent three distinct flows. 

The first flow illustrated would be Web traffic (HTTP protocol) between the client 
at IP ADDRESS 1 and the server at IP ADDRESSO. The client Web browser opens a 

5 random ephemeral high port (51,1 32) as illustrated in the example. A high port is utilized 
because the low port nvmibers less than 1024 are preassigned for designated services. One 
these designated services is port 80 for HTTP, which transfers displayable Web pages arid 
related files in the known manner. The Web browser sends the request to the server's port 
80. The server port responds by sending the requested Web page data in packets \^erein 

10 the port number in the packets transmitted to the client sets the destination port to 51,132 
of the client All communications by clients utilizing HTTP is sent to port 80 of the 
server. One C/S flow would be the HTTP communications between port 5 1,132 of 
ADDRESSl and port 80 of ADDRESSO. 

A flow is terminated if no communications occur between the two IP addresses and 

15 the one low port (e.g. port 80) for 330 seconds. Most Web browsers or a TCP connection 
send a reset packet (i.e. a packet with the R flag set) if no communications are sent or 
received for 5 minutes. An analysis can determine if the flow is abnormal or not for 
HTTP communications. 

The next flow illustrated is email trafiSc between the client and server utilizing port 

20 25. The client email £Q}plica1ion opens a random high ephemeral port, e.g. port 49,948 as 
illustrated in FIG. 4. The client's email application sends the email utilizing the Simple 
Mail Transfer Protocol (SMTP) to the server's port 25. Port 25 is conventionally 
designated for SMTP communications. A flow is terminated if no communications are 
delivered between the two IP addresses and the low port for 330 seconds. If the client 

25 sends another SMTP ^nail packet or packets within 330 seconds of the of the first 
email to the server, only one flow would exist 

For example, as shown Ia FIG. 4, if a second email packet originating firom the 
ephemeral port 35,620 is sent within 330 seconds, only one flow would exist If the 
second email packet was later than 330 seconds firom the first sent email, it would be 

30 classified as another flow for analysis purposes. An analysis can determine if the flow is 
abnormal or not for SMTP communications. 
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understood that the terms "computer," "operating system,*' and "application program" 
include all types of computers and the program modules designed to be implemented by 
the computers. 

The discussion of methods that follow, especially in the software architecture, is 

5 represented largely in terms of processes and symbolic representations of operations by 
conventional computer components, including a central processing unit (CPU), memory 
storage devices for the CPU, network communication interfaces, connected display 
devices, and input devices. Furthermore, these processes and operations may utilize 
conventional computer components in a heterog^eous distributed computing 

10 environment, including remote file servers, remote compute servers, and remote memory 
storage devices. Each of &ese conventional distributed computing components is 
accessible by the CPU via a communication network. 

The processes and operations performed by tibe computer include the manipulation 
of signals by a CPU, or remote server such as an Internet Web site, and the maintenance of 

15 these signals within data structures reside in one or more of the local or remote memory 
storage devices. Such data structures impose a physical organization upon the collection 
of data stored within a memory storage device and represent specific electrical, optical, or 
magnetic elements. These symbolic representations are the means use4 by those Called in 
the art of computer programming and computer construction to effectively convey 

20 teachings and discoveries to others skilled in the art For tiie purposes oftiiis discussion, a 
process is imderstood to include a sequence of conq>uter-executed steps leading to a 
conorete, useful, and tangible result, namely, the detection of intruders based \ipon C/S 
flows and other activity deemed heuristically to be a threat substantial enough to warrant 
assignment of a concern index value. 

25 These steps generally require manipulations of quantities such as IP addresses, 

packet length, header length, start times, end times, port numbers, and other packet related 
information. Usually, though not necessarily, these quantities take the form of electrical, 
magnetic, or optical signals capable of being stored, transferred, combined, compared, or 
otherwise manipulated. It is conventional for those skilled in the art to refer to these 

30 signals as bits, bytes, words, values, elements, symbols, characters, terms, numbers, 

points, records, objects, images, files or the like. It should be kept in mind, however, that 
these and similar terms should be associated with appropriate quantities for computer 
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client and server network applications that are being operating by the hosts that are 
observed participating in the flows observed (port profiling). 

Packet Classifier 

5 The header data is read by the packet classifier thread S 1 0. The packet classifier 

thread 510 runs whenever new packet iafonnation is available. Based on the source and 
destmation IP addresses, ^e thread SIO searches for an existing flow in the flow data 
structure 162. To facilitate searching and record insertion, a symmetric hash of the two IP 
addresses is generated and used as the index of an array tibat points to the beginning of a 

10 two-waylinkedlistof all flows with that hash value. As known to those skilled in the art, 
a symmetric hash is a mathematical process that creates a probabilistically unique number 
that &cilitates rapid indexing and sorting witiiin a data structure such as flow data 
structure 162. 

Flow processing is done for TCP and UDP packets, and the port numbers in the 
15 tran^rt layer header are used to identify the flow record to be updated. For ICMP 

packets that constitute rgections of a packet, the copy of the rejected packet in fixe ICMP 
data field is used to identify the IP addresses and port numb^ of the corresponding flow. 

For purposes of the description vMch follows, the IP address "vyifii the lower value, 
when considered as a 32-bit imsigned integer, is designated ip[0] and the corresponding 
20 port number is designated pt[0]. The higher IP address is designated ip[l] and the 

corresponduig TCP or UDP port number is designated pt[l]. At some point, eiflier pt[0] or 
pt[l] may be designated the ^'server" port by setting an appropriate bit in a bit map that is 
part of the flow record (record "state", bit 1 or 2 is set). 

If a particular packet 101 being processed by the packet classifier 510 matches a 
25 particular entry or record in the flow data structure 1 62, data jfrom that particular packet 
101 is used to update the statistics in the corresponding flow data structure record. A 
packet 101 is considered to match to a flow data structure record if both IP numbers match 
and: 

a) both port numbers match and no port is marked as the "server" port, or 
30 b) the port number previously marked as the "server*' port matches, or 
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unsigned char flag[2][7];//0 bad, 1 reset, 2 urgent, 3 syn, 4 syn-ack, 5 fin, 6 

firagments, // (counts of packets seen with various TCP flag combinatioDs) 
-7 UDP rejects 

unsigned short scans ; // max number ports seen for ip pair, detects "Port Scans" 
5 } flow[SLOTS] ; 

Notice that many of the fields are counters for each host, e.g., the number of packets and ' 
. bytes sent, the number of packets with various TCP flag-bit combinations sent for TCP 

flows, flie number of ICMP **port-unavailables" for a UDP flow. Also bitmaps can be 
10 filled in, such as the bitmap of all TCP flags seen which has been bitwise OR'ed with flie 

TCP flag field of each TCP packet Data is filled in for the source (originating) host 
The packet classifier fliread SIO also adds some data directly to the host data 

structure 166. Most of fliis data could be added later by the flow collector tiiread 520 

(such as bytes sent by each host), but adding it on a packet by packet basis allows 
15 collection of real time rate information (such as bytes sent in each time interval), lliese 

records are indicated in the host data structure 166 below. 

Host Data Structure • . 

The host data structure 166 accumulates data on all hosts that are observed 
20 participating in a flow. A description of this da^ structure in C language format follows: 

#defineHOST_SLOTS 65537 // number Host slots 
struct hostjdb { 
// data added by the Packet Classifier Thread 
25 unsigned long ip ; //ip address 

unsigned long down ; // linked list index 
unsigned long upi II linked list index 
unsigned long start ; // time host record started 
unsigned long last ; // time of last packet firom this host 
30 unsigned long udp_bytes ; // UDP bytes sent and received 

unsigned long bj^es^in ; // bytes received 
unsigned long bytes_injpp ; // Bytes over last 5 min interval 
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and a logic-tree analysis is done to classify them as either a normal flow, or a potential 
probe or other suspicious activity warranting assignment of a concern index value. 

Normal flows are &ose for which the corresponding statistics indicate a normal 
exchange of infonnation betwem two hosts. The host that initiated the flow is considered 

5 the client (i.e. flie computer that sent TCP SYN packets or sent an initial UDP packet). 
The other host is considered the server (i.e. the computer that sent TC3P S YN-ACK packets 
or responded to a UDP packet). Some data is exchanged during a normal flow. 

A potential probe is a flow that appears to have one host (a possible intruder) 
• . sending packets to gain information about another host (an intoided victim). An example 

10 ofapotentid probe is a flow that has TCP packets ofany sort sent by one host (the 
intmder) and ^proximately the same number of TCP reset packets sdnt by the other. 
Another example is a flow which has UDP packets answered by ICMP "port unavailable*' 
packets. A flow with ICMP "destination uiireachable" packets sent by one host would be 
considered a potential probe being done by the other host 

15 In accordance with the invention, some potential probes are much more likely to 

indicate probing than others are. To handle this, a value called the "concem index'' is 
calculated or otherwise determined for each flow, and this value is added to the concern 
index value being accmnulated in the host data structure 166. Table I pf FIG. 6 shows one 
scheme for assigning concern index values due to the flow analysis. After the flow is 

• 20 analyzed, the flow record is written to the flow log file and then cleared fix>m the flow data 
structure. 

Other Concern Index Increments 

Concem index (CI) values calculated firom packet anonialies also add to a host's 

25 accumulated concem index value. Table n of FIG. 7 shows one scheme for assigning 
concem index values due to other events revealed by the flow analysis. For example, 
there are many combinations of TCP flag bits that are rarely or never seen in valid TCP 
coimections. When one of these combinations is recognized by the packet classifier thread 
510, it directly adds a predetermined value to the sending host's accumulated concem 

30 index value. When the packet classifier thread 510 searches along the flow linked-list (i.e. 
flow data 162) for a match to the current packet 101, it keeps count of the number of flows 
active with matching IP addresses but no matching port number. If this number exceeds a 
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The alert manager 530 also looks for hosts whose CI or traffic (byte rate) exceeds 
preset alarm thresholds and which have not been handled on previous runs. The new 
alarm conditions can cause immediate operator notification by an operator notification 
process 542. These conditions can be highlighted on the user inter&ce^ and cause SNMP 

5 trap messages to be sent to a network monitor such as HP Openview, and/or email 
messages to the network administrator vAAoh in turn may cause messages to be sent to 
beepers or cell phones. Messages cm also be sent to cause automated devices such as a 
firewall manager 544 to drop packets going to or &om an offending host. It will thus.be 
£q>preciated that the present invention advantageously operates in conjunction with 

10 firewalls and other network security devices and processes to provide additional protection 
for an entity's computer network and computer resources. 

Hardware 

A preferred hardware configuration 800 of an embodiment that executes the 
15 fimctions of the above desodbed flow-based engine is desoibed in reference to FIG. 8. 
FIG. 8 illustrates a typically hardware configuration 800 for a network intrusion detection 
. system. A monitoring appliance 150 serves as a pass-by filter of network trafiBc. A 
network device 135, such as a router, switch, hub, tap, or the like, provides the location for 
connecting the monitoring appliance 1 50 to the network 899 for monitoring the network 
20 trafSc. 

As illustrated, the monitoring appliance 150 is preferably configured with two 
network inter&ce cards (NIC) 830 such as 3COM brand model 932 10/100 MHz Ethernet 
adapters or other adapters to match the network. However, it should be apparent to one 
skilled in the art that one or more cards can be utilized to accomplish the functions of the 

25 presently described dual card system. The monitor NIC 834 is typically set to a 

promiscuous mode or a similar fimction. The promiscuous mode is a mode of operation in 
which every data packet passing through the network device 135 will be received and 
read; An admin NIC 838 allows network interfacing and handles commands sent j5x)m the 
monitoring appliance 135. A NIC driver 820 enables the network traffic data to be 

30 exchanged with the processor 850. Other drivers 825 are utilized to interface or 
communicate with other devices including peripherals. These peripherals include 
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As discussed previously, the header data if each packet processed is read by the 
. packet classifier thread 510, Based on the source and destination IP addresses, the thread 
510 searches for an existing flow in the flow data structure 1 62, \s^ch is embodied as a 
data array in memory. A symmetric hash of the two IP addresses is used as the index into 
5 tihie array that points to the beginning of a two-way linked list of all flows with fliat hash 
value. 

Flow processing is done for TCP and UDP packets, and flie port numbers in flie 
transport layer heada: are used to identify the. flow record to be updated. For ICMP 
packets that constitute rejections of a packet, the copy of the rejected packet in the ICMP 
10 data field is -used to identify Ihe IP addresses and port numbers of the corresponding flow. 

A packet 101 is considered to match to a flow data structure record if both IP 
numbers match and: 

a) both port numbers match and ho port is marked as the *^servef' port, or 

b) the port nimiber previously niarked as the "server^ port inatch^^ 

15 . c) one of^e port numbers matches, but the other does not, and the neither port 

number has been marked as the server port (in this case the matching port number is 

mariced as the "server" port). 

If a new flow is determined, the yes branch of stqp 914 is foUo^wed by step 916. In 

step 9 1 6, a new flow record is created. If no flow exists that matches the current packet, a 
20 new flow record is started using the IP addresses and port numbers firom the current 

packet, and is linked to the end of the appropriate linked list of flow records. 

Hie IP address with the lower value, when considered as a 32-bit unsigned integer, 

is designated ip[0] and the corresponding port number is designated pt[0]. The higher IP 

address is designated ip[l] and the corresponding TCP or UDP port number is designated 
25 pt[l]. At some point, either pt [0] or pt[l] may be designated the "server'' port by setting a 

the appropriate bit in a bit map that is part of the flow record (record "state", bits 1 or 2 

set). 

Step 916 is followed by step 91 8, in which the flow records in the flow data 
structure 1 62 are updated. The time that the flow started, the packet capture time, is 
30 written into the record "start.*' The flow data structures updated by the packet classifier 
thread is discussed in detail in reference to FIG. 5. Step 918 is returned to step 912, in 
which the thread 510 determines if a new packet is available. 
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Step 945 is followed by step 946. In st^ 946, the flow record is written to the 
flow log file. Step 946 is followed by step 947. In step 947, the flow record is cleared 
fit)m the flow data structure. A&esr step 947, the thread is returned to step 942, in which 
the thread awaits for the requisite time. 

5 Referring next to FIG. 9C, the alarm manager thread 530 begins with step 972. In 

step 972, the thread 530 determines if a periodic time has elapsed. If the requisite time 
period has not elapsed, the no branch of step 972 is followed to step 972^ in vMch the 
thread 530 awaits the time to elapse. 

If &e tune has closed, the yes branch of step 972 is followed to step 973, in which 

10 the thread 530 performs concern index search. The alert manager thread 530 runs 
periodically (e.g., following the flow manager thread 520) and does a linear search 
through the host data structure 166. 

Step 973 is followed by step 974. In step 974, it compiles a number of lists tibat are 
written to various output files for lise by the user interface programs. For example, it 

15 collects an alert list ofhosts with CI above a certain threshold. This threshold may be 
adjusted so that the list is about 100 host records long. A user inter&ce program will 
preferably sort this list and display, in order of descending CI value, the top 60 hosts with 
high CI values. A sunilar list based on average byte mte over tiie last t^e interval (e.g., 5 
minutes) is also genoxited. If a range, or set ofranges,ofIP addresses have been defined 

20 by tiie network administrator as 'inside addresses," separate lists can be generated for 
•Inside" and "outside" hosts. Numerous oflier queries and reports 548 can be generated 
for review and analysis by the network administrator. The alert manager thread 530 writes 
the iqpdated data to various output files for use by the user interface, or for later off-line 
analysis. 

25 Step 974 is followed by step 975, in which the thread 530 determines if an alarm 

threshold has been exceeded. If the alarm threshold has not been exceeded, the no branch 
of step 975 is retumed to perform step 972. In step 972, tilie thread 530 determines if a 
requisite time period has elapsed. 

If an alarm threshold has been exceeded, the yes branch of step 975 is followed to 

30 step 976. In step 976, the alert manager thread generates certain predetennined signals 
designed to drawn the attention of a system administrator or other interested person. The 
alert manager 530 looks for hosts whose CI or traffic (byte rate) exceeds preset alarm 
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CLAIMS 

What is claimed is: 

1 . A method of analyzing networic commmiication traffic for potential intrusion activity, 
5 comprising the steps of: 

assigning packets to a flow; 

collecting flow data £rom packet headers; 

analyzing collected flow data to assign a concern index value to the flow based 
iqpon a probability that-flie flow was not normal for data commnnications; 
10 maintaining an accumulated concern index from flows associated with a host; and 

issuing an alarm signal once the accumulated concern index has exceeded an alarm 
threshold value. 

2. The method of claim 1, wherein the flow consists of tilie packets exchanged between 
15 two hosts that are associated with a single service. 

3. The method of claim 1, wherein the alarm signal iq>dates a firewall for filtering packets 
transmitted by a host. 

20 4. The method of claim 1, wherein the alarm signal genemtss a notification to the network 
administrator. 

5. The method of claim 1 , wherein each concem index value associated with a respective 
potential intrusion activity is' a predetermined fixed value. 

25 
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10. A system for analyzing network communication traffic, comprising: 

a computer system operable to classify packets into flows, collect flow data from 
packet header information, analyze collected flow data to assign a concern index value 
wherein each concem index value associated with a respective potential intrusion activity 
5 is a predetermined fixed value, and generate an alarm signal; and 

a communication system coupled to the computer system operable to send packets 
fix>m one host to another host 

11. A system for analyzing network communication traffic, comprismg: 

10 a processor opemble to classify packets into flows, collect flow data fix>m packet 

header infoimation, analyze collected flow data to assign a concem index value wherein 
each concem index value associated with a respective potential intrusion activity is a 
predetermined fixed value, and generate an alarm signal; 

m^ory coi:q)led to the processor operable to store iho flow data; 
15 a database coupled to processor op^ble to store log files; and 

and a network interface coupled to flie processor operable to monitor network ' 

traffic. 

12. A method of analyzing network communication traffic for potential intrusion activity, 
20 comprising the steps of: 

analyzing packet header information; 

determining a transport level protocol specifying a format of a data area ; 
issuing an alarm when the transport level protocol is identified as User Datagram 
Protocol and the data segment associated with User Datagram Protocol packet contains 
25 two or less bytes of data. 
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